Clustering WordNet Senses Utilizing Modified and Novel Similarity Metrics

CS 229 Final Project Report

Authors

  • Christopher Thad Hughes
  • Sushant Prakash
Abstract

Introduction

We approach the problem of clustering senses in Princeton's WordNet (Fellbaum 1998), a manually created dictionary/thesaurus that attempts to model the structure underlying human concepts. A synset, the fundamental unit in WordNet, is represented by a group of synonyms and a gloss definition. It is connected to other synsets through a variety of semantic links, such as hypernyms (type-of) and meronyms (part-of). A word is associated with one or more synsets, each representing a particular sense of the word. This electronic database provides a sense inventory for Word Sense Disambiguation. However, its senses are very fine-grained, sometimes too fine for even humans to distinguish, and this has made reasonable disambiguation performance difficult to achieve.
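
The paragraph above describes how WordNet is organized: synonym groups, glosses, typed links, and words that map to several senses. The Python sketch below models that structure and shows one naive way to group a word's senses. It is a minimal sketch under stated assumptions. The tiny synset fragment, its ids (such as `bank_river`) and its glosses are hypothetical, hand-built for illustration, and not taken from WordNet. The helpers `senses`, `path_similarity` and `cluster_senses` are also hypothetical names. Similarity here is a plain path-length score, 1 / (1 + d), over hypernym links. Grouping is single-link thresholding. Both are stand-ins for illustration, not the modified or novel metrics developed in this report.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional, Set


@dataclass
class Synset:
    """Toy synset: a group of synonyms, a gloss, and typed links to other synset ids."""
    lemmas: List[str]
    gloss: str
    hypernyms: List[str] = field(default_factory=list)  # "type-of" links
    meronyms: List[str] = field(default_factory=list)   # "part-of" links (not used below)


# Hypothetical hand-built fragment: ids and glosses are illustrative, not real WordNet data.
SYNSETS: Dict[str, Synset] = {
    "entity": Synset(["entity"], "that which exists"),
    "physical_object": Synset(["physical_object"], "a tangible thing", ["entity"]),
    "group": Synset(["group"], "things considered as a unit", ["entity"]),
    "formation": Synset(["formation"], "a natural land feature", ["physical_object"]),
    "institution": Synset(["institution"], "an established organization", ["group"]),
    "bank_river": Synset(["bank"], "sloping land beside a body of water", ["formation"]),
    "bank_ridge": Synset(["bank"], "a long ridge or pile, as of snow", ["formation"]),
    "bank_finance": Synset(["bank", "depository"],
                           "a financial institution that accepts deposits", ["institution"]),
}

# Undirected adjacency over hypernym links, built once.
NEIGHBORS: Dict[str, Set[str]] = defaultdict(set)
for sid, syn in SYNSETS.items():
    for hyper in syn.hypernyms:
        NEIGHBORS[sid].add(hyper)
        NEIGHBORS[hyper].add(sid)


def senses(word: str) -> List[str]:
    """All synset ids whose synonym group contains `word` (one id per sense)."""
    return [sid for sid, syn in SYNSETS.items() if word in syn.lemmas]


def path_length(a: str, b: str) -> Optional[int]:
    """Breadth-first shortest path between two synsets; None if they are unconnected."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in NEIGHBORS[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + shortest path length); 0.0 when the synsets are not connected."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def cluster_senses(word: str, threshold: float) -> List[List[str]]:
    """Single-link grouping: senses joined by any chain of pairs with similarity >= threshold."""
    parent = {s: s for s in senses(word)}  # union-find forest over the word's senses

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(list(parent), 2):
        if path_similarity(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in parent:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(senses("bank"))                          # ['bank_river', 'bank_ridge', 'bank_finance']
    print(cluster_senses("bank", threshold=0.25))  # [['bank_finance'], ['bank_ridge', 'bank_river']]
```

The score uses the same 1 / (1 + distance) form as NLTK's `path_similarity`. In this toy fragment, the two landform senses of `bank` lie two links apart (score 1/3), and each lies six links from the financial sense (score 1/7). A threshold of 0.25 therefore keeps the financial sense separate. Lowering the threshold to 0.1 merges all three senses into one group.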

Similar resources

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

Meaningful Clusters

We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.

Semantic Similarity Applied to Spoken Dialogue Summarization

We present a novel approach to spoken dialogue summarization. Our system employs a set of semantic similarity metrics using the noun portion of WordNet as a knowledge source. So far, the noun senses have been disambiguated manually. The algorithm aims to extract utterances carrying the essential content of dialogues. We evaluate the system on 20 Switchboard dialogues. The results show that our ...

A Gloss Composition and Context Clustering Based Distributed Word Sense Representation Model

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

Harnessing WordNet Senses for Supervised Sentiment Classification

Traditional approaches to sentiment classification rely on lexical features, syntax-based features or a combination of the two. We propose semantic features using word senses for a supervised document-level sentiment classifier. To highlight the benefit of sense-based features, we compare word-based representation of documents with a sense-based representation where WordNet senses of the words ...

Publication date: 2006